ABSTRACT


A review is given of a technique that relates the performance of the
CPU ↔ Main Memory section of small computers to standard design parameters
such as wordlength, number of registers, etc. The technique was constructed
from execution time and memory space data obtained by applying three small
benchmark program kernels to fifteen computers. The data was used to
determine regression equations that provide a best fit to the data. Time and
space equations were developed for each kernel. Three or four variables from
the set of design parameters were used as the independent variables in each
equation.

These  variables  were  chosen,  in  each  case,  as  the  ones  that 


This report applies these equations to both the AN/UYK-20 and
AN/UYK-7 computers to predict their time and space values with respect to the
three kernels. The kernels have been programmed for these two machines and
the actual time and space values obtained are compared to the predicted
results, as well as to the actual results obtained for the fifteen computers.

1. Introduction


The authors have developed a technique based on regression equations
that can be used to predict relative execution time and memory space
performance measures for the CPU ↔ Main Memory section of a range of small
computers. The range is basically characterized as that of 8- to 24-bit
wordlength computers. In order to predict the performance of a particular
computer, only a few standard design parameters need to be known, e.g.,
wordlength, number of registers, byte-addressability, etc. The regression
equations were developed by applying three small benchmark program kernels
to each of fifteen computers.

The purpose of this report is to describe the results of the development
of the technique, followed by an application of the equations to predict time
and space performance of the AN/UYK-20 and AN/UYK-7 computers on the three
kernels. The actual time and space requirements of these machines on the
kernels are then compared to the predictions.

In  Section  2 of  the  report,  the  technique  is  described. 

In Section 3, the details of the technique are documented, and the
results of applying the technique to the AN/UYK-20 and AN/UYK-7 computers are
presented.

Section  4 discusses  the  general  technique  and  offers  specific  conclusions 
that  may  be  drawn  from  the  application  of  the  technique  to  the  above  machines. 

We do not review computer performance methods in general and we do not
attempt to justify the use of kernels in particular. These aspects can be
found by consulting articles in the bibliography compiled by Agajanian [2].
The text edited by Freiberger contains detailed discussions of the use of
statistical techniques in computer performance evaluation. Our work is in
the same spirit as this text.


2. Performance Evaluation Technique

We  postulate  that  it  should  be  possible  to  get  some  idea  of  the  relative 
execution  times  and  memory  space  requirements  for  members  of  a class  of 
computers  when  they  are  placed  in  a particular  application  environment  by 
examining  their  design  parameters,  e.g.,  wordlength,  number  of  registers, 
byte-addressability,  etc.  The  method  that  the  authors  have  developed  to 
quantify  the  relationship  is  briefly  described  as  follows: 

(a)  Choose  a class  of  computers  and  select  a representative  subset. 

[We  selected  fifteen  computers  from  the  8-  to  24-bit  wordlength  class;  so  a 
focus  on  minicomputers,  and  a little  above  and  below,  was  taken.] 

(b) Specify, at the flow-chart level, a few programs that exercise the
CPU ↔ Main Memory section of the machines. [We chose three small benchmark
program kernels from the areas of high precision arithmetic, character
manipulation, and list processing, respectively. No input/output was
involved.]

(c)  Code  all  kernels  on  all  machines  and  evaluate  the  execution  times 
in  memory  cycles  and  the  memory  space  requirements  in  bits. 

[Execution times were determined by the use of an abstracted trace routine
program. This tracer was constructed from the flow of control exhibited by
the flowcharts. Hand-calculated values for execution times of the various
straight-line parts of the kernels were input to the tracer. It then followed
the flow of control (looping and branching) to accumulate the total execution
time. Some uniform assumptions were made about frequency of data-dependent
branching, as required.]

(d) Choose forms of equations (regression fit analysis) that have
standard machine parameters as the independent variables and time (T) and
space (S), as defined above, as the dependent variables. [The forms chosen
were either:

The first of these we will call the multiplicative form and the second we
will call the additive form. The machine parameter set, the x_i's, includes
variables such as wordlength, number of registers, byte-addressability, etc.
A total of six machine design parameters was found to be adequate.]

(e) Perform a standard regression fit of these equation forms to the
observed data in order to evaluate the c_i and c_ij constants. The least
squares criterion determines the best fit. [This results in six equations,
a T and S equation for each kernel. Some experimentation was used to
determine i) the best form, and ii) the three or four x_i's to use in each
situation.]

(f)  These  equations  can  then  be  used  to  predict  relative  performance 
among  all  machines  of  the  class.  [This  report  presents  the  results  of  such 
predictions  for  the  AN/UYK-20  and  AN/UYK-7  computers,  and  the  results  of 
checking  the  predictions  by  coding  the  kernels  on  each  machine  to  determine 
actual  T and  S values.] 
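
Steps (d) and (e) can be sketched in Python as follows. This is only an
illustration under stated assumptions, not the authors' program: the
multiplicative form is assumed to be a product of powers of the parameters
and the additive form a sum of linear and quadratic terms (shapes inferred
from the fitted equations quoted later in the report), the fit is done in
log space through the normal equations, and the function names and the data
in the demonstration are hypothetical.

```python
import math


def multiplicative(c0, exponents, xs):
    """Assumed multiplicative form: y = c0 * x_1**c_1 * x_2**c_2 * ..."""
    y = c0
    for c, x in zip(exponents, xs):
        y *= x ** c
    return y


def additive(c0, coeffs, xs):
    """Assumed additive form: y = c0 + sum(c_i1 * x_i + c_i2 * x_i**2)."""
    return c0 + sum(a * x + b * x * x for (a, b), x in zip(coeffs, xs))


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_multiplicative(rows, ys):
    """Least-squares fit of log y = log c0 + sum(c_i * log x_i).

    Note: this minimizes squared error in log y, which is an assumption;
    the report does not say in which space the least squares fit was done.
    """
    design = [[1.0] + [math.log(x) for x in row] for row in rows]
    target = [math.log(y) for y in ys]
    p = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    atb = [sum(d[i] * t for d, t in zip(design, target)) for i in range(p)]
    z = solve(ata, atb)
    return math.exp(z[0]), z[1:]


if __name__ == "__main__":
    # Hypothetical (wordlength, add time) pairs, not data from the study.
    rows = [(8, 3), (12, 2), (16, 4), (24, 2), (16, 3)]
    ys = [multiplicative(1000.0, [-0.5, 0.7], r) for r in rows]
    print(fit_multiplicative(rows, ys))
```

Fitting the multiplicative form in log space keeps the problem linear in the
unknown constants, so the same normal-equation solver could also serve the
additive form by using x_i and x_i squared as the design columns.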

3.  Results 

It  is  convenient  to  present  the  results  in  the  same  sequence  as  the 
description  of  the  methodology  in  the  previous  section. 

(a) The fifteen computers of the original study are listed in Table 1
along with their wordlengths. The two AN/UYK machines are also included.
All tables and figures referenced in this section are collected together at
the end of the section.

(b) In order to select appropriate small benchmark program kernels, it
is necessary to consider some application areas and formulate a small
benchmark program in each area. The benchmarks must then be individually
analyzed in order to extract an appropriate kernel. A kernel is taken to be
a structurally identifiable part of the benchmark that accounts for most of
the execution time of the benchmark. A brief description of the benchmarks
and kernels follows. The Appendix contains more details, including flowcharts.

Benchmark  #1:  High  Precision  Arithmetic 

The wordlength of the machines ranged from 8 to 24 bits. As an indication
of their ability to handle high precision arithmetic, this benchmark
performs 48-bit integer division by a standard technique. Over 90% of the
execution time was spent in three subroutines named MULAD, MULSB, and SHFTM.
These routines constituted the kernel.

Benchmark #2: Character Manipulation

The crucial problem in character manipulation applications is to use
storage effectively, and at the same time facilitate fast processing. For
example, in a machine where the smallest addressable data unit is 16 bits,
two bytes or characters must be packed per data unit if good storage
efficiency is to be maintained. However, this will impede the accessing of a
single character. Since memory space is usually limited in the minicomputer
class, it was decided that on all machines maximum core packing of character
strings would be used in this benchmark. The actual processing problem was
the construction of a file of records to be printed. This print file was
extracted from certain fields of a base file. A format list specified the
fields to be selected. Linear searching of both the format list and the base
file was involved in the process of constructing the print file. A set of
routines that accounted for about 65% of the total execution time was chosen
as the kernel of this benchmark.

Benchmark #3: List Processing


This  benchmark  exercises  the  ability  of  the  machines  to  handle  scattered 
data  items.  An  algorithm  for  binary  tree  insertion  and  balancing  was  used. 

A kernel  that  accounted  for  80%  of  the  execution  time  was  identified. 

(c) In all benchmarks, the method of proceeding to the determination
of an appropriate kernel was as follows. Flow-charts were constructed for
the complete benchmark without any particular machine in mind. Even though
these flow-charts were reasonably detailed, there was enough flexibility at
the coding phase to exploit the particular strengths of the instruction set
and CPU facilities of a given machine. The benchmarks were actually run on
only one machine, which confirmed the completeness of the flow-charted
algorithms and facilitated the extraction of appropriate kernels. These
kernels were then programmed at the machine assembly level by a single
programmer, Rannem, on the other fourteen machines. Rannem was also the
programmer for the AN/UYK-20 and AN/UYK-7 experiments. Because of the small
size and easy understandability of the kernels, execution times and memory
space requirements were relatively easy to compute for these kernels. As
mentioned in the previous section, an abstracted trace routine program was
used to compute execution times. Memory space was recorded in bits and
execution time was recorded in number of memory cycles. The latter parameter
allows a concentration on machine design features, and the actual speeds of
the various technologies used in the machines have no effect on the results.

The time and space values for all machines of the original study, as well
as for the AN/UYK machines, are displayed in Table 2.

(d) The multiplicative and additive forms of regression equations have
been stated in the previous section. The dependent variables are execution
time, T, in memory cycles, and memory space, S, in bits. The final set of
six computer design parameters that were found to be adequate for explaining
the performance results are listed in Table 3, along with their values on
each of the machines. We list brief explanations of these parameters here:

x1: Memory Wordlength (bits)

The maximum number of bits per memory access. This normally
corresponds to the basic instruction length of the computer.

x2: Minimum Bytes per Memory Access

This identifies whether or not the machine has byte addressability.

x3: Add Time (memory cycles)

Most machines have a number of ways of determining an effective
memory address. We have standardized on a definition of add time as the
number of memory cycles needed to perform the add instruction with one
operand in a CPU register and the other operand in a directly addressed
memory location, leaving the answer in the CPU. The memory cycle needed to
fetch the instruction is included. This definition of add time means that x3
is actually a general parameter that probably indicates the speed of
execution of other binary operations such as AND, MASK, SUB, etc., where one
operand is in memory, the other is in a CPU register, and the result is left
in a CPU register.

x4: Registers

This is the number of wordlength CPU registers that can be used
generally for holding both operands/results and main memory addresses.
Dedicated index registers and program counters are excluded from this count.

x5: Address Reach per Memory Word of Instruction (bits)

This is the number of main memory address bits per memory word of
instruction. For example, a 12-bit wordlength machine that can explicitly
name 2^8 memory locations in a one word memory reference instruction has an
address reach per memory word of instruction of 8 (= 8/1), while a 16-bit
wordlength machine that can explicitly name 2^16 memory locations in a two
word memory reference instruction also has an address reach per memory word
of instruction of 8 (= 16/2).

x6: Address Modification (bits)

The number of bits that are used in an instruction to determine the
addressing mode for accessing an operand. Examples are the indirect bit,
the index bit, etc.


(e) Based on the data developed for the fifteen machines in the original
study, as recorded in Table 2, best fit equations for both time and
space were heuristically determined for each of the three application areas
represented by the benchmark kernels. This heuristic search involved trying
both the multiplicative and additive equation forms with various subsets of
machine design parameters as the independent variables. This process, its
accuracy, and its limitations are discussed in the next section of the report.
We present only the final results here. Table 4 lists the subsets of
parameters that were found to be the most significant in determining the
performance results in each benchmark area. In each case, the parameters
are listed in order of significance.

The regression equations are listed in Table 5.

(f) The equations of Table 5 were determined from the data derived
from coding the kernels on 15 computers. We can now use these equations
to predict time and space measures for the AN/UYK computers. This is
done by substituting the appropriate computer parameter values from Table 3
into the equations of Table 5. The results are displayed in Table 6,
along with the actual values of time and space that were determined by coding
the kernels on the AN/UYK computers. These actual values were also entered
in Table 2. There are no values entered for kernel #1 (high precision
arithmetic) on the AN/UYK-7. This is because it is a 32-bit wordlength
machine and the #1 benchmark involved a 48-bit integer division. All other
machines have a wordlength that divides evenly into 48, so that it would be
misleading to try to fit a 32-bit wordlength machine to this program. The
suitability of including a 32-bit wordlength machine in the study will be
discussed in Section 4 of the report.

In  order  to  gauge  the  quality  of  the  predictions  listed  in  Table  6,  it 
is  helpful  to  examine  the  closeness  with  which  the  regression  equations 
actually  fit  the  performance  values  of  the  original  fifteen  machines.  Three 
questions  that  seem  natural  to  ask  about  the  original  fits  are: 

Q1: What is the range of actual values of each of the performance
parameters for each of the kernels over all fifteen computers?

Q2: What is the average error in prediction over all fifteen computers
with respect to each performance parameter for each kernel?

Q3: What is the range of error in prediction over all fifteen computers
with respect to each performance parameter for each kernel?

Table 7 answers these questions. All prediction errors are stated as
percentages, calculated as follows:

    %ERROR = ( |PREDICTION - ACTUAL| / ACTUAL ) x 100
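
As a worked illustration of substituting Table 3 parameters into a Table 5
equation and scoring the result, the Python sketch below evaluates the
Kernel #3 space equation for the Modcomp III. It rests on several
assumptions: the last term of the printed equation is taken to be quadratic
in x6, following the pattern of the other terms; the Modcomp III parameter
values (x4 = 15, x5 = 8, x6 = 4) are read from Table 3; and the actual
Kernel #3 space value is not read from Table 2 but inferred from the
Section 4.1 total (4,544) less the Kernel #1 and #2 space values (1,120 and
1,520). The function names are hypothetical.

```python
def space_kernel3(x4, x5, x6):
    """Kernel #3 space (bits), additive form with the Table 5 constants.

    Assumes the x6 term pair is linear plus quadratic like the others.
    """
    return (6140
            - 658 * x5 + 33.3 * x5 ** 2
            - 149 * x4 + 6.83 * x4 ** 2
            - 252 * x6 + 41.7 * x6 ** 2)


def percent_error(prediction, actual):
    """%ERROR = |PREDICTION - ACTUAL| / ACTUAL x 100."""
    return abs(prediction - actual) / actual * 100


# Modcomp III: 15 registers, address reach 8, 4 address modification bits.
predicted = space_kernel3(x4=15, x5=8, x6=4)

# Inferred actual value: total space over 3 kernels minus kernels #1 and #2.
actual = 4544 - 1120 - 1520

if __name__ == "__main__":
    print(predicted, actual, percent_error(predicted, actual))
```

Under these assumptions the prediction lies within a few percent of the
inferred actual value.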


Another useful form for the presentation of a regression fit is via a
scatter diagram of the experimental data on which the family of curves
generated by the regression equation is superimposed. Six of these diagrams
are presented in Figures 1 through 6, one for each performance parameter for
each kernel. The format of all of these diagrams is the same. A performance
measure is the ordinate, and the most significant machine design
parameter for that measure is the abscissa. This design parameter is listed
first in Table 4. Holding the other significant parameters at appropriate
constant values then generates the family of curves. These values cover
the range that occurred in the fifteen computers of the original study.
Therefore, the extent to which the curves span the space occupied by the
scatter points is a visual indication of the quality of the fit. The
unlabelled X's represent the data from the original fifteen machine study.
The predicted and actual values for each of the AN/UYK computers are labelled.


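TABLE 1
Computers

Manufacturer                                Computer          Wordlength
                                                              (bits)

Varian Data Machines                        Varian 520/i           8
General Automation                          SPC-12                 8
Interdata                                   Interdata 1            8
Computer Terminal Corporation               Datapoint 2200         8
Digital Equipment Corporation               PDP-8/I               12
Honeywell                                   H 112                 12
Digital Equipment Corporation               PDP-11/20             16
Data General Corporation                    Supernova             16
Modular Computer Systems                    Modcomp III           16
DataMate Computer Systems                   DataMate 16           16
A/S Kongsberg Vaapenfabrikk                 Kongsberg 400         16
Datacraft Corporation                       DC 6024               24
Systems Engineering Laboratories            SEL 804A              24
General Electric                            GE-PAC 4010           24
Norwegian Defence Research Establishment    SAM                   24

Sperry Univac                               AN/UYK-20             16
Sperry Univac                               AN/UYK-7              32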


TABLE 2

Time (memory cycles) and Space (memory bits) Results

                     Kernel #1         Kernel #2       Kernel #3
Computer           Time    Space     Time    Space       Time

Varian 520/i      23866     1576     1890     1352      19633
SPC-12            33462     2064     2123     1752      23043
Interdata 1       41532     3016     2223     1456      50698
Datapoint 2200    49345     2104     3086     1664      41294
PDP-8/I           11448     1224     3627     1776      19305
H 112             23475     1452     7027     1880      29767
PDP-11/20          6071      944     1597     1584      10796
Supernova          5460     1072     2818     2192      10000
Modcomp III        4616     1120     1012     1520       7256
DataMate           6296     1328     2533     1872       9440
Kongsberg 400      4809     1072     2515     1872       7805
DC 6024            1365      768      936     1968       5778
SEL 804A           1576      936     4989     3024       8942
GE-PAC 4010        2962     1248     6793     3456      12615
SAM                1535      864     4640     2904       7288

AN/UYK-20          5202      992      922     1040       7580
AN/UYK-7          not applicable


TABLE 3

Computer Design Parameter Values

                  Memory      Minimum   Add Time              Address Reach   Address
                  Wordlength  bytes/    (memory               per memory      modifi-
                  (bits)      memory    cycles)    Registers  word of         cation
                              probe                           instruction     (bits)
Computer          x1          x2        x3         x4         x5              x6

Varian 520/i       8          1         3           7          5              3
SPC-12             8          1         5           5          6              0
Interdata 1        8          1         3           1          4              1
Datapoint 2200     8          1         7           5          3.2            0
PDP-8/I           12          2         2           1          8              1
H 112             12          2         4.5         1          8              1
PDP-11/20         16          1         4.2         6          8              3
Supernova         16          2         3           0 **       S              *9
Modcomp III       16          1         3          15          8              4
DataMate          16          2         2           2          8              3
Kongsberg 400     16          2         2           6          8              3
DC 6024           24          1         2           5         15              3
SEL 804A          24          3         2           5         15              3
GE-PAC 4010       24          3         2           1         15              4
SAM               24          3         2          10         14              4

TABLE 4

Important Parameters for Predicting Performance Measures

TABLE 5

Time   T = 1.35 x 10^5 (x4^-0.343)(x1^-0.885)(x3^0.497)

Space  S = 6140 - 658x5 + 33.3x5^2 - 149x4 + 6.83x4^2 - 252x6 + 41.7x6^2


TABLE 6

Predicted and Actual Performance Values for the AN/UYK Computers

Columns: Predicted, Actual, Error in Prediction

AN/UYK-20 actual values: Kernel #1 time 5202, space 992;
Kernel #2 time 922, space 1040.

AN/UYK-7, Kernel #1: (not applicable on this benchmark)


TABLE 7

Relative Quality of AN/UYK Predictions

                     Range of           Range of Error     Average Error
                     Actual Values      in Prediction      in Prediction
Kernel               over original      over original      over original
#        Measure     15 computers       15 computers       15 computers

1        Time        1,365 → 49,345     2 → 26%            10%
1        Space         768 →  3,016     5 → 37%            14%
2        Time          936 →  7,027     0 → 35%            11%
2        Space       1,352 →  3,456     0 → 13%             6%

Error  in  AN/UYK 
Predictions 


not 

applicable 


[Figure 4: Regression Equation Plots for Space Performance on Kernel #1;
abscissa: wordlength (x1)]

[Figure 5: Regression Equation Plots for Space Performance on Kernel #2;
abscissa: bytes/probe (x2)]

[Figure 6: Regression Equation Plots for Space Performance on Kernel #3]

4. Discussion, Use and Limitation of Results

In  this  section,  we  will  first  discuss  some  specific  conclusions  that 
can  be  drawn  regarding  the  AN/UYK  computers.  This  discussion  will  include  a 
listing  of  the  changes  that  result  in  the  regression  equations  when  the 
AN/UYK-20  is  included  with  the  original  fifteen  computers.  The  general  use 
of  the  results,  in  particular  the  use  of  the  six  Figures  of  Section  3,  will 
then  be  sketched.  Finally,  some  limitations  of  the  statistical  technique 
itself  will  be  mentioned. 

4.1  Conclusions  Regarding  the  AN/UYK-20  and  AN/UYK-7  Computers 

We  begin  with  a summary  evaluation  of  the  AN/UYK-20  in  each  benchmark 
area,  as  compared  to  the  other  five  16-bit  wordlength  machines: 

(a) The AN/UYK-20 on Kernel #1 (high-precision arithmetic)

The range of execution times for the six 16-bit wordlength machines on
this kernel is 4616 → 6296 (see Table 2). The AN/UYK-20 is third best at
5202, slightly better than the average time of 5409. We should note that
only ADD and SUBTRACT arithmetic instructions were used in this benchmark
in performing division. Therefore, whether or not a machine has multiply
and/or divide instructions has no bearing on the results. The space range
among 16-bit machines is 944 → 1328, with an average of 1088. The AN/UYK-20
is thus also above average on this measure, ranking second at 992.

(b)  The  AN/UYK-20  on  Kernel  #2  (character  manipulation) 


As can be guessed, the machines that have byte addressing capability
perform best here. The time and space ranges for 16-bit wordlength machines
are 922 → 2818 and 1040 → 2192, with averages of 2165 and 1680, respectively.
The AN/UYK-20 is best among the 16-bit computers on both performance measures
on this kernel. The time performance was reasonably predicted by the
regression equations (see Tables 6 and 7) but the AN/UYK-20 is anomalously
good in space performance. Our prediction was high by 37%, whereas the
average prediction error was 6% and the range was 0 → 13% over all fifteen
computers in the original study. A possible reason for the relatively good
performance of the AN/UYK-20 on this kernel, that is not accounted for by
our equation variables, is the existence of a small (4-bit) immediate
operand field in one of the instruction formats. Use of this instruction
type has led to increased coding space efficiency in many parts of this
kernel.

(c)  The  AN/UYK-20  on  Kernel  #3  (list  processing) 

The time and space ranges for the 16-bit wordlength computers are
7,256 → 10,796 and 1,808 → 2,416, with averages of 8813 and 2079, respectively.
The AN/UYK-20 is again well above average, ranking second on time (7,580) and
first on space (1,808).
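
The Kernel #1 figures quoted in (a) can be re-derived from the Table 2
entries for the five original 16-bit machines together with the AN/UYK-20
values given in the text. The Python sketch below is an illustration only;
the dictionary layout and helper names are hypothetical.

```python
# machine: (Kernel #1 time in memory cycles, Kernel #1 space in bits)
kernel1 = {
    "PDP-11/20": (6071, 944),
    "Supernova": (5460, 1072),
    "Modcomp III": (4616, 1120),
    "DataMate": (6296, 1328),
    "Kongsberg 400": (4809, 1072),
    "AN/UYK-20": (5202, 992),
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def rank(machine, index):
    """1-based rank of a machine on one measure; smaller values rank first."""
    own = kernel1[machine][index]
    return 1 + sum(1 for v in kernel1.values() if v[index] < own)


mean_time = mean([t for t, _ in kernel1.values()])
mean_space = mean([s for _, s in kernel1.values()])

if __name__ == "__main__":
    print(mean_time, rank("AN/UYK-20", 0))
    print(mean_space, rank("AN/UYK-20", 1))
```

The same helpers apply to the other kernels once the corresponding columns
are substituted.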

An overall ranking of the six 16-bit wordlength computers derived by
summing the time and space values for each computer over all kernels is:

                   Total Time, T          Total Space, S
                   (over 3 kernels)       (over 3 kernels)

Modcomp III        12,884   (1)           4,544   (2)
AN/UYK-20          13,704   (2)           3,840   (1)
Kongsberg 400      15,129   (3)           4,960   (4)
DataMate 16        18,269   (4)           5,440   (5)
Supernova          18,278   (5)           5,680   (6)
PDP-11/20          18,464   (6)           4,616   (3)

If we take the Time x Space product, the ranking becomes:

                   T x S
                   (10^7)

AN/UYK-20           5.262
Modcomp III         5.854
Kongsberg 400       7.504
PDP-11/20           8.523
DataMate 16         9.938
Supernova          10.380
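
The product ranking follows directly from the totals listed above, as the
short Python sketch below illustrates. The variable names are hypothetical;
the numbers are the total time and total space values from the preceding
ranking.

```python
# machine: (total time over 3 kernels, total space over 3 kernels)
totals = {
    "Modcomp III": (12884, 4544),
    "AN/UYK-20": (13704, 3840),
    "Kongsberg 400": (15129, 4960),
    "DataMate 16": (18269, 5440),
    "Supernova": (18278, 5680),
    "PDP-11/20": (18464, 4616),
}

# Time x Space product for each machine; a smaller product ranks first.
products = {name: t * s for name, (t, s) in totals.items()}
ranking = sorted(products, key=products.get)

if __name__ == "__main__":
    for name in ranking:
        print(f"{name:15s} {products[name] / 1e7:7.3f}")
```

Scaling the products by 10^7 reproduces the units used in the listing.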

Since the AN/UYK-7 is the only 32-bit wordlength computer in the study,
the only point of interest is how well the regression equations actually
predicted its performance. The equations cannot be expected to be very
accurate when applied outside of the range of parameters for which they were
derived. However, the prediction errors for the AN/UYK-7 are not unreasonable
when compared with the range of prediction errors over the original fifteen
computers (see Table 7). We should note that the execution times for the
AN/UYK-7 do not account for the fact that overlapped memory bank accessing is
possible in this machine, since the program can be stored in one memory bank,
and the data in another. Examination of the two kernels on which the AN/UYK-7
was evaluated indicates that execution times can be reduced by an average of
about 23% if instruction and data fetching are overlapped.

Finally, as an indication of the sensitivity of the regression equation
constants to a change in the sample data points, we have added the data from
the AN/UYK-20 to the data from the original fifteen computers and recomputed
the equations. The new equations, along with the constants from the old
ones (in parentheses) are:

      (2.29)           (-2.26)    (-0.276)    (0.660)
T#1 = 2.25 x 10^6 (x1^-2.27)(x4^-0.254)(x3^0.638)

      (1.01)           (-0.720)   (-0.109)
S#1 = 1.01 x 10^4 (x1^-0.720)(x4^-0.111)

      (1780) (1.40)   (0.687)    (-0.155)    (-0.290)
T#2 = 1830 (x2^1.50)(x3^0.693)(x4^-0.168)(x1^-0.301)

      (2130) (-964)   (385)      (2.97)   (0.807)      (-18.7)    (-0.296)
S#2 = 2110 - 1020x2 + 403x2^2 + 9.05x1 + 0.587x1^2 - 0.0064x4 - 2.26x4^2

      (1.35)           (-0.343)   (-0.885)    (0.497)
T#3 = 1.35 x 10^5 (x4^-0.345)(x1^-0.885)(x3^0.497)

      (6140) (-658)  (33.3)     (-149)   (6.83)      (-252)   (41.7)
S#3 = 6110 - 654x5 + 33.1x5^2 - 138x4 + 5.60x4^2 - 268x6 + 46.9x6^2


4.2 General Conclusions

The  overall  tendency  of  the  performance  curves  given  in  Figures  1-6 
suggests  the  following: 

1.  In  the  range  of  applications  represented  by  the  benchmark  programs 
in  this  study,  the  optimum  wordlength  is  around  16  bits.  Longer  wordlengths 
cannot  be  used  efficiently,  particularly  in  character-oriented  applications. 
Shorter  wordlength  results  in  a limited  address  reach.  This  leads  to  an 
increase in addressing overhead because of the use of such techniques as


on  performance  most  clearly,  and  Figure  6 shows  an  "optimum"  space  result 
when  address  reach  is  around  8-10.  It  should  be  noted  that  address  reach 
is  strongly  correlated  with  wordlength.  The  range  8 - 10  in  address  reach 


corresponds  to  a wordlength  range  of  16  to  18  bits. 

2. The number of general purpose registers has a pronounced effect on
performance, as can be seen in Figure 3. However, increasing the number of
registers beyond about 6 or 8 seems to have very little effect. Possibly,
the programmer cannot make effective use of a larger number. While it may be
argued that the threshold is programmer- and program-dependent, we do not
believe that a substantial change in performance can be achieved by
increasing the number of registers beyond 6 or 8.

In what follows, we present a possible general interpretation for the
results of this study. The design parameters used as independent variables
can be regarded as the "raw material" for the computer design process. The
designer has to make optimum use of these parameters to maximize performance,
which is measured by the memory space and execution time of the benchmark
programs. This interpretation is consistent with the fact that values for
most of the independent variables used are likely to be chosen relatively
early in the design process. These variables, for example, do not include
any reference to the specifics of the instruction set of the machine, with
the exception, perhaps, of the number of bits devoted to address modification,
x6. Therefore, it can be concluded that the curves obtained from the
regression analysis represent fundamental tendencies dictated by these
variables. On the other hand, the "scatter" of actual performance figures
relative to these curves represents the variations among different designs
that use the same raw material. This suggests that the distance between
actual performance points and the curves is a measure of the success of the
designer in optimizing the details of the instruction set. In this sense,
the small amount of scatter in most of the figures given in this report
indicates a rather surprising degree of uniformity in the design of
commercially available computers. Alternatively, it can be regarded as an
indication that this aspect of the design process is close to being optimum.


4.3 Limitations of the Statistical Technique

Three important limitations to the statistical techniques used in this
report are:

1. Small sample size.

2. Choice of benchmarks and size of kernels.

3. Programmer variability.


Fifteen machines were used in the original study. These machines were
selected rather arbitrarily. The main factor was simply the availability to
the authors of adequate information. The curves derived are representative
of true tendencies only to the extent that these fifteen machines are
typical of the class of 8- to 24-bit computers. The authors feel, however,
that 16-bit machines have been reasonably well represented.

The second important point is the choice of the benchmarks and the size
of the kernels. The kernels required about 100 machine instructions on
16-bit computers. It may be argued that these kernels are not large enough
to exhibit the relative merits of the machines in the environment of much
larger programs. This, of course, is a general limitation of the kernel
approach. Because of the fundamental nature of the parameters used in this
study, we feel that the general tendencies exhibited in the performance
curves are not likely to change with kernel size.

In order to test programmer variability, the three kernels were coded on
one of the machines by a different programmer. An average variation of
the order of 20% was observed in space requirements. In addition to
variability among programmers, we should also recognize a second factor,
namely, that a single programmer may not be equally familiar with all
machines. An attempt was made to minimize this effect in the present
study by rechecking all programs after complete initial coding.
APPENDIX:  Flow-chart Listings

This appendix includes complete flow-chart documentation for benchmark
#1, and the main program flow-charts for the other two benchmarks. The main
purpose of providing these flow-charts is to give the reader a feeling for
the level of complexity of the kernels. The listing includes:


Benchmark #1

    Main program description
    Main flow-chart
    MULAD routine
    MULSB routine
    SHFTM routine
    LOMSB routine

Benchmark #2

    Main program description
    Main flow-chart

Benchmark #3

    Main program description
    Main flow-chart


The  kernel  for  benchmark  #1  consists  of  the  routines  MULAD,  MULSB,  and 
SHFTM.  The  kernels  for  the  other  two  benchmarks  are  of  the  same  level  of 
size  and  complexity  as  that  of  benchmark  #1. 


Benchmark Program No. 1

Divide

PURPOSE:   To perform division between two multiple
           precision (48 bits) unsigned numbers;
           C = A/B. (A, B and C can be anywhere in
           core memory.)

INPUT:     Address of dividend (A).
           Address of divisor (B).
           Address of quotient (C).

OUTPUT:    Quotient in C.
           Remainder in A.
           Divisor in B.

ROUTINES:  MULAD, MULSB, SHFTM, LOMSB.
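
The division specified above can be illustrated with a minimal Python
sketch. It is not the authors' code: the roles given to the four routines
(MULAD as multiple-precision add, MULSB as multiple-precision subtract,
SHFTM as multiple-precision shift, LOMSB as locate most significant bit)
are inferred from their names, and the 16-bit word length, the three-word
little-endian layout, and the helpers to_words and from_words are
assumptions made for illustration. The scheme shown is ordinary restoring
(shift-and-subtract) division.

```python
WORD_BITS = 16                      # assumed word length
NWORDS = 3                          # 3 x 16 bits = 48-bit operands
MASK = (1 << WORD_BITS) - 1

def to_words(value):
    """Split an integer into NWORDS words, least significant first."""
    return [(value >> (WORD_BITS * i)) & MASK for i in range(NWORDS)]

def from_words(words):
    """Reassemble an integer from its words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def mulad(a, b):
    """Multiple-precision add: a += b in place; returns the carry out."""
    carry = 0
    for i in range(NWORDS):
        total = a[i] + b[i] + carry
        a[i] = total & MASK
        carry = total >> WORD_BITS
    return carry

def mulsb(a, b):
    """Multiple-precision subtract: a -= b in place; returns 1 on borrow."""
    borrow = 0
    for i in range(NWORDS):
        diff = a[i] - b[i] - borrow
        a[i] = diff & MASK
        borrow = 1 if diff < 0 else 0
    return borrow

def shftm(a, count):
    """Shift a in place: left by count bits if count > 0, right if count < 0."""
    for _ in range(abs(count)):
        carry = 0
        if count > 0:
            for i in range(NWORDS):
                v = (a[i] << 1) | carry
                a[i] = v & MASK
                carry = v >> WORD_BITS
        else:
            for i in reversed(range(NWORDS)):
                v = a[i]
                a[i] = (v >> 1) | (carry << (WORD_BITS - 1))
                carry = v & 1

def lomsb(a):
    """Bit position of the most significant 1 bit (0 = LSB), or -1 if a == 0."""
    for i in reversed(range(NWORDS)):
        if a[i]:
            return i * WORD_BITS + a[i].bit_length() - 1
    return -1

def divide(a, b, c):
    """Leave the quotient in c and the remainder in a; b is unchanged on exit."""
    msb_b = lomsb(b)
    if msb_b < 0:
        raise ZeroDivisionError("division by 0")
    for i in range(NWORDS):
        c[i] = 0
    shift = lomsb(a) - msb_b
    if shift < 0:                   # a < b: quotient 0, remainder a
        return
    shftm(b, shift)                 # align the MSB of b with the MSB of a
    for step in range(shift + 1):
        shftm(c, 1)                 # make room for the next quotient bit
        if mulsb(a, b):
            mulad(a, b)             # subtraction went negative: restore a
        else:
            c[0] |= 1               # subtraction succeeded: quotient bit is 1
        if step < shift:
            shftm(b, -1)            # b ends up back at its original value
```

For example, with a = to_words(1000), b = to_words(7) and c = to_words(0),
calling divide(a, b, c) leaves 142 in c, 6 in a, and 7 in b.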


[Main flow-chart for Benchmark Program No. 1 (Divide). Legible elements
include the entry point, a division-by-0 check, the temporaries TEMP0 and
TEMP1, and calls to LOMSB, SHFTM, MULSB and MULAD.]
PURPOSE:   To print selected fields (actual I/O not performed)
           of a batch of character-records as specified by
           a format program and a list of field numbers.

           Record specifications: 80 characters packed and
           represented in ASCII or "6-bit internal ASCII"
           (2 most significant bits chopped off).

           Format Program specifications: Variable length
           string specifying a maximum of 40 alpha and/or
           numeric fields. The first character of an alpha
           field is represented by "A". Subsequent characters
           are represented by "B" or "nB", where "n" is a one
           or two digit decimal number. The first character
           of a numeric field is represented by "N". Subsequent
           characters are represented by "M" or "nM",
           where "n" is as above. The string is terminated
           by "X".

           List specifications: The list contains a maximum of
           40 one or two digit decimal numbers. Legal numbers
           are 1 to 40. The list has been sorted in ascending
           numeric order. The first item contains the length
           of the list.

INPUT:

OUTPUT:    The record batch is printed according to specifications.

ROUTINES:  PRIFL.
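
To make the format program notation concrete, the following Python sketch
parses a format string and extracts the listed fields from one record. It
is an illustration only, not the PRIFL routine. It assumes that "nB" (or
"nM") stands for n further characters of the current field, and that the
first item of the list counts the field numbers that follow it. The
function names parse_format and select_fields are hypothetical.

```python
def parse_format(fmt):
    """Parse a format program such as "A2BN3MAX" into [(kind, width), ...]."""
    fields = []   # each entry is [kind, width]; kind is "A" or "N"
    count = ""    # digits of a pending repeat count
    for ch in fmt:
        if ch.isdigit():
            count += ch
            if len(count) > 2:
                raise ValueError("repeat count has more than two digits")
        elif ch in "BM":
            owner = "A" if ch == "B" else "N"
            if not fields or fields[-1][0] != owner:
                raise ValueError(f"{ch!r} does not continue a matching field")
            # Assumption: "nB" adds n characters, a bare "B" adds one.
            fields[-1][1] += int(count) if count else 1
            count = ""
        elif count:
            raise ValueError("repeat count must be followed by B or M")
        elif ch in "AN":
            fields.append([ch, 1])      # first character of a new field
        elif ch == "X":
            if len(fields) > 40:
                raise ValueError("more than 40 fields")
            return [tuple(f) for f in fields]
        else:
            raise ValueError(f"unexpected character {ch!r}")
    raise ValueError("format program is not terminated by X")


def select_fields(record, fmt, wanted):
    """Return the fields of record named in wanted, in list order."""
    fields = parse_format(fmt)
    length, numbers = wanted[0], wanted[1:]
    if length != len(numbers):
        raise ValueError("first list item must equal the number of entries")
    # Starting column of each field within the record.
    starts, pos = [], 0
    for _, width in fields:
        starts.append(pos)
        pos += width
    out = []
    for n in numbers:                   # field numbers are 1-based
        _, width = fields[n - 1]
        out.append(record[starts[n - 1]:starts[n - 1] + width])
    return out
```

Under these assumptions, the format program "A2BN3MAX" describes a
three-character alpha field, a four-character numeric field, and a
one-character alpha field, and select_fields("ABC1234Z", "A2BN3MAX",
[2, 1, 3]) returns the first and third fields.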
Build  a Balanced  Binary  Tree


PURPOSE:   To insert information items into a binary tree
           and balance the tree after each insertion.

           Structure of tree: The tree has a dummy node
           (called ROOT) serving as pointer to the root of
           the tree. The information in this node is a
           high number (usually the highest positive number
           that can be represented within the information
           area of the node). This means that when the tree
           is not empty, the root-pointer node (ROOT) has a
           valid left link and a null right link. The tree
           is unthreaded.

           Structure of nodes:

           Field                     Size
           Left link (LLNK)          Length of address word of
                                     particular CPU.
           Information (INFO)        [...] available data format
                                     ([...] more). Enough to hold an
                                     ASCII character.
           Right link (RLNK)         Length of address word of
                                     particular CPU.
           Balance indicator (IND)   2 bits. Must be able to store and
                                     differentiate between LEFT (L),
                                     RIGHT (R) and BALANCED (B).

INPUT:

OUTPUT:

ROUTINES:  INSRT, BLNCE.
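
The tree described above matches the usual height-balanced (AVL) scheme,
which can be sketched in Python as follows. This is an illustration under
stated assumptions rather than the INSRT and BLNCE routines themselves:
the class Node and the functions insert and insert_item are hypothetical
names, the division of work between the two original routines is not
reproduced, duplicate items are assumed to be ignored, and sys.maxsize
stands in for the "high number" stored in ROOT.

```python
import sys

class Node:
    def __init__(self, info):
        self.llnk = None    # left link
        self.info = info    # information
        self.rlnk = None    # right link
        self.ind = "B"      # balance indicator: "L", "R" or "B"

def rotate_right(n):
    l = n.llnk
    n.llnk, l.rlnk = l.rlnk, n
    return l

def rotate_left(n):
    r = n.rlnk
    n.rlnk, r.llnk = r.llnk, n
    return r

def insert(node, info):
    """Insert info below node; return (new subtree root, height grew?)."""
    if node is None:
        return Node(info), True
    if info == node.info:
        return node, False                      # assumption: ignore duplicates
    if info < node.info:
        node.llnk, grew = insert(node.llnk, info)
        if not grew:
            return node, False
        if node.ind == "R":
            node.ind = "B"
            return node, False
        if node.ind == "B":
            node.ind = "L"
            return node, True
        l = node.llnk                           # left side is now too tall
        if l.ind == "L":                        # single rotation
            node.ind = l.ind = "B"
            return rotate_right(node), False
        g = l.rlnk                              # double rotation
        node.ind = "R" if g.ind == "L" else "B"
        l.ind = "L" if g.ind == "R" else "B"
        g.ind = "B"
        node.llnk = rotate_left(l)
        return rotate_right(node), False
    node.rlnk, grew = insert(node.rlnk, info)
    if not grew:
        return node, False
    if node.ind == "L":
        node.ind = "B"
        return node, False
    if node.ind == "B":
        node.ind = "R"
        return node, True
    r = node.rlnk                               # right side is now too tall
    if r.ind == "R":                            # single rotation
        node.ind = r.ind = "B"
        return rotate_left(node), False
    g = r.llnk                                  # double rotation
    node.ind = "L" if g.ind == "R" else "B"
    r.ind = "R" if g.ind == "L" else "B"
    g.ind = "B"
    node.rlnk = rotate_right(r)
    return rotate_left(node), False

def insert_item(root, info):
    """Insert into the tree hanging off the left link of the dummy ROOT."""
    root.llnk, _ = insert(root.llnk, info)

def inorder(node):
    return [] if node is None else inorder(node.llnk) + [node.info] + inorder(node.rlnk)

def height(node):
    return 0 if node is None else 1 + max(height(node.llnk), height(node.rlnk))

ROOT = Node(sys.maxsize)    # dummy node; its right link stays null
for item in range(1, 8):
    insert_item(ROOT, item)
```

Inserting the items 1 to 7 in ascending order, which would produce a
degenerate chain without rebalancing, leaves a tree of height 3 with the
item 4 at its root.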


